Similar resources
Web-crawling reliability
In this article, I investigate the reliability, in the social science sense, of collecting informetric data about the World Wide Web by Web crawling. The investigation includes a critical examination of the practice of Web crawling and contrasts the results of content crawling with the results of link crawling. It is shown that Web crawling by search engines is intentionally biased and selectiv...
Crawling the Infinite Web
A large number of publicly available Web pages are generated dynamically upon request and contain links to other dynamically generated pages. Many Web sites that are built with dynamic pages can create arbitrarily many pages. This poses a problem for the crawlers of Web search engines, as the network and storage resources required for indexing Web pages are neither infinite nor free. In thi...
High-Performance Web Crawling
High-performance web crawlers are an important component of many web services. For example, search services use web crawlers to populate their indices, comparison shopping engines use them to collect product and pricing information from online vendors, and the Internet Archive uses them to record a history of the Internet. The design of a high-performance crawler poses many challenges, both tec...
Crawling the Web
The large size and the dynamic nature of the Web highlight the need for continuous support and updating of Web based information retrieval systems. Crawlers facilitate the process by following the hyperlinks in Web pages to automatically download a partial snapshot of the Web. While some systems rely on crawlers that exhaustively crawl the Web, others incorporate “focus” within their crawlers t...
Focused Web Crawling Algorithms
Nowadays the web is rich in all kinds of information, and this information is freely available thanks to hypermedia information systems and the Internet. This information has greatly influenced our lives, our lifestyles and our ways of thinking. A web search engine is a complex multi-level system that helps us search the information available on the Internet. A web crawler is one of the most i...
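The abstracts above describe crawling in general terms: a crawler follows hyperlinks to download a partial snapshot of the Web, its resources are finite even when a site can generate arbitrarily many pages, and a focused crawler restricts what it explores. The sketch below illustrates these ideas. It is not the method of any paper listed here. It makes several assumptions. The hypothetical function `get_links(url)` stands in for fetching a page and extracting its links. `max_pages` is a hypothetical download budget. The optional `is_relevant` predicate is a deliberately simple stand-in for "focus": links are followed only from pages it accepts.

```python
from collections import deque
from typing import Callable, List, Optional, Set


def crawl(
    get_links: Callable[[str], List[str]],
    seeds: List[str],
    max_pages: int,
    is_relevant: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Breadth-first crawl that follows hyperlinks from the seed URLs.

    Assumption: get_links(url) is a hypothetical stand-in for downloading
    a page and extracting its outgoing links. The crawl stops after
    max_pages pages, so the result is only a partial snapshot.
    """
    frontier = deque(seeds)      # URLs waiting to be fetched, FIFO order
    seen: Set[str] = set(seeds)  # URLs already queued, so none is fetched twice
    snapshot: List[str] = []     # pages "downloaded", in visit order

    while frontier and len(snapshot) < max_pages:
        url = frontier.popleft()
        snapshot.append(url)     # stands in for fetching and storing the page
        if is_relevant is not None and not is_relevant(url):
            continue             # keep the page but do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return snapshot


def calendar_links(url: str) -> List[str]:
    # Hypothetical dynamic site: each day page links to the next day, forever.
    day = int(url.rsplit("/", 1)[1])
    return [f"/calendar/{day + 1}"]


print(crawl(calendar_links, ["/calendar/0"], max_pages=3))
# ['/calendar/0', '/calendar/1', '/calendar/2']
```

Without the `max_pages` budget, the crawl of the hypothetical calendar site would never terminate. This illustrates the "infinite Web" problem. A production crawler would also need politeness delays, URL normalization, and robots.txt handling, none of which are shown here.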
Journal
Journal title: Journal of the American Society for Information Science and Technology
Year: 2004
ISSN: 1532-2882,1532-2890
DOI: 10.1002/asi.20078